Infectious disease surveillance serves to monitor the health of populations and identify new threats as quickly as possible after they arise (Murray & Cohen, 2017). It is often based on healthcare-based reporting systems whereby primary care providers or hospitals report numbers of individuals identified as likely cases of a disease to central authorities where these numbers are collated and reported as aggregates. During the Covid-19 pandemic in the United Kingdom, reporting of cases has mostly involved collating numbers of laboratory-identified infections with SARS-CoV-2 via self-reporting, community testing sites or hospitals.
A separate and independent system of collating information on the state of the pandemic has been run by the Office for National Statistics (ONS) via its Community Infection Survey, which conducts repeated cross-sectional surveys of Polymerase Chain Reaction (PCR) positivity indicating infection with SARS-CoV-2, as well as antibody seroprevalence, via household visits (Pouwels et al., 2020). By adjusting for biases in the sampled population, the study has been used to produce daily population-wide estimates of infection prevalence, unaffected by testing capacity or reporting behaviour, which often vary by age as well as sociodemographic or other factors.
While repeated randomised cross-sectional sampling of positivity and antibodies is useful in itself for tracking an epidemic in real time, the resulting estimates can also be used to estimate epidemiological quantities by combining them with information on infection kinetics and immunological responses. Here we present a semi-mechanistic model that combines PCR positivity curves, generation interval estimates and vaccination data with ONS PCR positivity and antibody data to estimate infection incidence and its growth rate, reproduction numbers and rates of antibody waning.
We obtained the published estimates of daily prevalence of PCR positivity beginning on 26 April 2020 from the ONS Community Infection Survey separately by nation, region, age group and variant, alongside their 95% confidence intervals, from the published spreadsheets on the ONS website. ONS estimates of prevalence on a given day vary between publication dates because the internal model used to calculate prevalence involves smoothing, such that new data points in the present affect the estimates for times past. We aggregated estimates of PCR positivity for a single day produced for different publication dates by calculating the central estimate and confidence limits as the medians of the respective central estimates and confidence limits.
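The aggregation across publication dates can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis; the function name `aggregate_publications`, the tuple layout and the numbers are assumptions made up for the example.

```python
from statistics import median


def aggregate_publications(estimates):
    """Combine estimates of one day's prevalence from several publications.

    `estimates` is a list of (central, lower, upper) tuples, one per
    publication date (hypothetical layout). Each component is aggregated
    separately by taking its median across publications.
    """
    centrals, lowers, uppers = zip(*estimates)
    return median(centrals), median(lowers), median(uppers)


# Made-up estimates (in %) of prevalence on a single day, as they might
# appear in three successive publications.
same_day = [(1.10, 0.95, 1.26), (1.14, 1.00, 1.29), (1.08, 0.93, 1.22)]
print(aggregate_publications(same_day))
```

Taking medians component by component means that the aggregated limits need not come from the same publication as the aggregated central estimate.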
We developed a Bayesian model to estimate epidemiological quantities from ONS PCR positivity estimates and, optionally, population-level antibody prevalence estimates and vaccination coverage.
We estimated the proportion of the population newly infected \(I(t)\) as a latent variable that is convolved with a PCR positivity curve \(p(s)\), the probability that someone infected at time \(s=0\) tests PCR positive at time \(s\), to yield the prevalence of PCR positivity \(P(t)\). \[ P(t) = \sum_{s= 0}^{t_\text{p,max}} p(s) I(t - s) \] where \(t_\text{p,max}=60\) is the maximum time modelled for which a person can stay PCR positive. We assumed each \(p(s)\) to have an independent normal prior distribution at each time \(s\) after infection, with mean and standard deviation taken from estimates in another study (Hellewell et al., 2021). Infection incidence \(I(t)\) is distinct from the estimates of PCR positivity incidence provided by ONS alongside the prevalence estimates, as it allows for the probability that infections yield negative PCR test results as a function of the time since infection.
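The convolution of incidence with the PCR positivity curve can be illustrated with a short Python sketch. The function name `pcr_prevalence` and the toy inputs are hypothetical, and the sketch assumes zero incidence before the start of the series, which is a simplification made only for this example.

```python
def pcr_prevalence(incidence, positivity):
    """Compute P(t) = sum_{s=0}^{t_p,max} p(s) I(t - s) for each day t.

    `incidence[t]` is I(t) and `positivity[s]` is p(s), so t_p,max is
    len(positivity) - 1. Incidence before day 0 is treated as zero
    (an assumption of this sketch).
    """
    prevalence = []
    for t in range(len(incidence)):
        total = 0.0
        for s, p_s in enumerate(positivity):
            if t - s >= 0:
                total += p_s * incidence[t - s]
        prevalence.append(total)
    return prevalence


# Toy inputs: a made-up positivity curve with t_p,max = 3 (the model uses 60)
# and a made-up incidence series.
positivity = [0.0, 0.5, 0.8, 0.4]
incidence = [0.001, 0.002, 0.004, 0.004, 0.002]
prevalence = pcr_prevalence(incidence, positivity)
print(prevalence)
```

In this toy series incidence peaks on days 2 and 3 while prevalence is still rising on day 4, because prevalence accumulates infections from several preceding days.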
We used Gaussian Process (GP) priors to ensure smoothness of the estimates and to deal with data gaps. Either \(I(t)\) has a GP prior with exponentiated quadratic kernel, \[ \begin{aligned} I(t) &= \text{logit}^{-1}(i_0 + i(t))\\ i(t) &\sim \text{GP}(t) \end{aligned} \] where \(i_0\) is the estimated mean of the GP, or the GP prior is applied to higher order differences, for example to the growth rate, \[ i(t) - i(t - 1) \sim \text{GP}(t) \] which implies that growth, rather than incidence, remains at an estimated mean level in the absence of data, usually leading to better real-time performance (Abbott et al., 2020). The results shown in this paper were obtained using this formulation with a GP prior on the growth rate.
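To make the growth-rate formulation concrete, the following Python sketch draws one incidence trajectory from such a prior. It is an illustration under stated assumptions, not the Stan implementation: the helper names, the hyperparameters `alpha` (GP magnitude) and `rho` (length scale), the zero GP mean and the diagonal jitter are all choices made for this example, and the sketch assumes that an inverse logit maps the unconstrained scale to a proportion.

```python
import math
import random


def exp_quad_cov(times, alpha, rho, jitter=1e-6):
    """Exponentiated quadratic covariance matrix with diagonal jitter."""
    n = len(times)
    return [
        [
            alpha ** 2 * math.exp(-((times[a] - times[b]) ** 2) / (2 * rho ** 2))
            + (jitter if a == b else 0.0)
            for b in range(n)
        ]
        for a in range(n)
    ]


def cholesky(k):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            acc = k[r][c] - sum(low[r][m] * low[c][m] for m in range(c))
            low[r][c] = math.sqrt(acc) if r == c else acc / low[c][c]
    return low


def sample_incidence(n_days, i0, alpha, rho, rng):
    """Draw I(t) with a zero-mean GP prior on i(t) - i(t - 1)."""
    times = list(range(n_days))
    low = cholesky(exp_quad_cov(times, alpha, rho))
    z = [rng.gauss(0.0, 1.0) for _ in times]
    # Correlated daily changes on the logit scale: growth = L z.
    growth = [sum(low[r][c] * z[c] for c in range(r + 1)) for r in range(n_days)]
    incidence = []
    level = i0
    for g in growth:
        level += g  # accumulate the differences to obtain i0 + i(t)
        incidence.append(1.0 / (1.0 + math.exp(-level)))  # inverse logit
    return incidence


draw = sample_incidence(n_days=30, i0=-6.0, alpha=0.05, rho=7.0, rng=random.Random(1))
print(draw[:5])
```

Because the GP acts on the differences, a draw from this prior drifts smoothly instead of reverting to a fixed incidence level.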
We assumed that the probability of observing prevalence \(Y_{\text{P}, t}\) at time \(t\) was given by independent normal distributions with mean \(P(t)\) and standard deviation \[\sigma_{\text{P}, t} = \sigma_\text{P} + Y^\sigma_{\text{P}, t}\] where \(\sigma_\text{P}\) was estimated as part of the inference procedure and \(Y^\sigma_{\text{P}, t}\) was calculated from the reported confidence intervals in the ONS data, assuming independent normal errors. For data sets where ONS reported only weekly estimates, for example at the sub-regional level, we compared these to the average of our daily prevalence estimates across the reported time period.
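The observation model can be sketched in Python as below. The conversion from a 95% interval to a standard deviation (interval width divided by twice the normal quantile) is an assumption about how \(Y^\sigma_{\text{P}, t}\) might be derived under normal errors, and all function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def sd_from_interval(lower, upper):
    """Standard deviation implied by a symmetric 95% interval (assumed normal)."""
    return (upper - lower) / (2 * Z_95)


def prevalence_log_lik(observed, lower, upper, modelled, sigma_p):
    """Sum of independent normal log densities of the observed prevalence.

    Each day has mean P(t) (`modelled`) and standard deviation
    sigma_P + Y^sigma_{P,t}, with Y^sigma_{P,t} taken from the interval.
    """
    total = 0.0
    for y, lo, hi, p in zip(observed, lower, upper, modelled):
        sd = sigma_p + sd_from_interval(lo, hi)
        total += -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((y - p) / sd) ** 2
    return total


def period_mean(modelled, start, end):
    """Average modelled daily prevalence over days start..end (inclusive)."""
    window = modelled[start : end + 1]
    return sum(window) / len(window)


# Made-up example: three daily observations with their interval limits.
print(prevalence_log_lik([0.010, 0.012, 0.011], [0.008, 0.010, 0.009],
                         [0.012, 0.014, 0.013], [0.010, 0.011, 0.011], 0.0005))
```

For weekly data, `period_mean` would be applied to the modelled daily series before evaluating the same normal density against the reported weekly estimate.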
Using the estimated infection incidences \(I(t)\) we estimated growth rates \(r(t)\) as \[ r(t) = \log I(t) - \log I(t - 1) \] and reproduction numbers \(R(t)\) using the renewal equation as \[ R(t) = \frac{I(t)}{\sum_{s=0}^{t_\text{g,max}} g(s) I(t - s)} \] where \(g(s)\) is the distribution of the generation interval since the time of infection (Fraser, 2007). We assumed a maximum generation interval of \(t_\text{g,max}=14\). We used re-estimated generation intervals from early in the pandemic in Singapore as reported previously (Abbott et al., 2020).
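Both derived quantities follow directly from an incidence series, as the Python sketch below shows. The function names and toy inputs are hypothetical; the generation interval used here has \(g(0) = 0\) and only three entries (the model uses \(t_\text{g,max}=14\)), and \(R(t)\) is returned only for days with a full incidence history, which is a simplification for this example.

```python
import math


def growth_rates(incidence):
    """r(t) = log I(t) - log I(t - 1), for t = 1, ..., T."""
    return [
        math.log(incidence[t]) - math.log(incidence[t - 1])
        for t in range(1, len(incidence))
    ]


def reproduction_numbers(incidence, gen_interval):
    """R(t) = I(t) / sum_{s=0}^{t_g,max} g(s) I(t - s), for t >= t_g,max.

    `gen_interval[s]` is g(s), so t_g,max is len(gen_interval) - 1.
    """
    t_max = len(gen_interval) - 1
    result = []
    for t in range(t_max, len(incidence)):
        denom = sum(g * incidence[t - s] for s, g in enumerate(gen_interval))
        result.append(incidence[t] / denom)
    return result


# Toy inputs: incidence doubling every day and a made-up generation interval.
incidence = [0.001, 0.002, 0.004, 0.008, 0.016]
gen_interval = [0.0, 0.5, 0.5]
print(growth_rates(incidence))
print(reproduction_numbers(incidence, gen_interval))
```

With daily doubling, each growth rate equals \(\log 2\) and each reproduction number equals \(8/3\), since the denominator averages the incidence one and two days earlier.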
When additionally using antibody data, we convolved the modelled infections \(I(t)\) as well as input data on vaccinations \(Y_{\text{V}, t}\) with distributions quantifying the delay to generating detectable antibodies (by default set to 4 weeks for both infection and vaccination), yielding potentially antibody-generating time series from infection \(I^{\text{A}}\) and vaccination \(V^{\text{A}}\). We then calculated antibodies from infection as \[ A^{\text{I}}(t) = A^{\text{I}}(t - 1) + \beta I^{\text{A}}(t) (1 - A(t - 1))^k - \gamma_\text{I} A^{\text{I}}(t - 1) \] and antibodies from vaccination as \[ A^{\text{V}}(t) = A^{\text{V}}(t - 1) + \delta V^{\text{A}}(t) (1 - A(t - 1))^l - \gamma_\text{V} A^{\text{V}}(t - 1) \] with the total population proportion with antibodies given as the sum of the two, \[ A(t) = A^{\text{I}}(t) + A^{\text{V}}(t) \]
Here, the additional parameter \(\beta\) can be interpreted as the proportion of new infections that increase the population proportion with antibodies; the remainder do not, either due to lack of seroconversion or because they are breakthrough infections in those with existing antibodies. The parameters \(k\) and \(l\) govern the degree to which new seropositives preferentially arise in those not seropositive so far. The parameters \(\gamma_\text{I}\) and \(\gamma_\text{V}\) can be interpreted as rates of antibody waning following natural infection and vaccination, respectively. This formulation implies the simplifying assumptions that the waning of detectable antibodies is exponential, that vaccine doses are allocated randomly amongst those with or without existing antibodies, and that the proportion of new vaccinations that lead to seroconversion \(\delta\) is constant and independent of age, vaccine used, and dose number.
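The antibody recursion can be written out as a short Python sketch. The function name `antibody_prevalence` and the example values are hypothetical, the inputs are assumed to be the already delayed series \(I^{\text{A}}\) and \(V^{\text{A}}\), and starting both antibody compartments at zero is an assumption of this example.

```python
def antibody_prevalence(inf_ab, vac_ab, beta, delta, gamma_i, gamma_v, k, l,
                        a_i0=0.0, a_v0=0.0):
    """Iterate the antibody equations and return A(t) for each time step.

    `inf_ab[t]` and `vac_ab[t]` are the delayed series I^A(t) and V^A(t).
    `a_i0` and `a_v0` are the initial values of A^I and A^V (assumed zero
    by default in this sketch).
    """
    a_inf, a_vac = a_i0, a_v0
    total = []
    for i_a, v_a in zip(inf_ab, vac_ab):
        without_ab = 1.0 - (a_inf + a_vac)  # 1 - A(t - 1)
        new_inf = a_inf + beta * i_a * without_ab ** k - gamma_i * a_inf
        new_vac = a_vac + delta * v_a * without_ab ** l - gamma_v * a_vac
        a_inf, a_vac = new_inf, new_vac
        total.append(a_inf + a_vac)  # A(t) = A^I(t) + A^V(t)
    return total


# Made-up inputs: constant delayed infections and vaccinations, no waning.
trajectory = antibody_prevalence(
    inf_ab=[0.1, 0.1], vac_ab=[0.2, 0.2],
    beta=0.5, delta=1.0, gamma_i=0.0, gamma_v=0.0, k=1.0, l=1.0,
)
print(trajectory)
```

Both compartments use the total \(A(t - 1)\) in the saturation term, so infection-derived and vaccine-derived antibodies compete for the same pool of people without detectable antibodies.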
The model was implemented in Stan and run using the cmdstanr R package. All code needed to reproduce the results shown in this paper is available at https://github.com/epiforecasts/inc2prev.
Figure 3.1: Model posteriors for England. A. Estimates of daily modelled prevalence and modelled prevalence as published by ONS. B. Estimated incidence of new infections. C. Estimated antibody prevalence and estimates as published by ONS. D. Estimated reproduction numbers.
The model was able to reproduce the daily prevalence estimates and weekly antibody prevalence estimates published by ONS with reasonable accuracy when run until 15 November 2021 (Figure 3.1). The peaks of the corresponding incidence curve are earlier, higher and sharper. Estimated reproduction numbers highlight some key phases of the UK pandemic between April 2020 and November 2021, in particular rapid increases due to the emergence of the Alpha variant in December 2020, followed by a period of low transmission during lockdown until March 2021, and rapid spread of the Delta variant from May to July 2021, followed by a period of relatively steady transmission.
| Parameter | Description | Estimate (90% CI) |
|---|---|---|
| \(\beta\) | Proportion infected that seroconvert | 0.69–0.95 |
| \(\gamma_\text{I}\) | Antibody waning following infection | 0.00065–0.0028 |
| \(\gamma_\text{V}\) | Antibody waning following vaccination | 0.00098–0.0027 |
| \(\delta\) | Proportion vaccinated that seroconvert | 0.95–0.99 |
| \(k\) | Efficacy adjustment of immunity following infection | 0.21–1.4 |
| \(l\) | Efficacy adjustment of immunity following vaccination | 0.41–0.59 |
Posterior estimates of the recovered biological parameters are shown in the table above. Some of the parameter estimates show high levels of correlation, suggesting issues of identifiability (Figure 3.2).